Segment Anything in 3D with NeRFs

Neural Information Processing Systems

We refer to the proposed solution as SA3D, for Segment Anything in 3D. Users only need to provide a manual segmentation prompt (e.g., rough points) for the target object in a single view, which SAM uses to generate the object's 2D mask in that view.
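The interface described above, a few rough point prompts in one view producing a 2D mask, can be illustrated with a toy stand-in. SAM itself is a learned model requiring downloaded weights; the region-growing function below only mimics the prompt-to-mask interface and is not SA3D's method.

```python
from collections import deque

def mask_from_point_prompt(image, seed, tol=10):
    """Toy stand-in for point-prompted segmentation: grow a binary mask
    from a single (row, col) seed over 4-connected pixels whose intensity
    is within `tol` of the seed pixel. Illustrates the prompt -> 2D mask
    interface only; SAM uses a learned model, not region growing."""
    h, w = len(image), len(image[0])
    sr, sc = seed
    ref = image[sr][sc]
    mask = [[0] * w for _ in range(h)]
    mask[sr][sc] = 1
    q = deque([seed])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr][nc]
                    and abs(image[nr][nc] - ref) <= tol):
                mask[nr][nc] = 1
                q.append((nr, nc))
    return mask

# A bright 3x3 square on a dark background; the prompt lands inside it.
img = [[0] * 6 for _ in range(6)]
for r in range(1, 4):
    for c in range(2, 5):
        img[r][c] = 100
m = mask_from_point_prompt(img, (2, 3))
```

In SA3D, the resulting single-view mask is then lifted into the NeRF to segment the object in 3D.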







Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation

Waithaka, John, Busogi, Moise

arXiv.org Artificial Intelligence

Semantic segmentation of satellite imagery is crucial for Earth observation applications, but remains constrained by limited labelled training data. While self-supervised pretraining methods like Masked Autoencoders (MAE) have shown promise, they focus on reconstruction rather than localisation, a fundamental aspect of segmentation tasks. We propose adapting LOCA (Location-aware), a position prediction self-supervised learning method, for multimodal satellite imagery semantic segmentation. Our approach addresses the unique challenges of satellite data by extending SatMAE's channel grouping from multispectral to multimodal data, enabling effective handling of multiple modalities, and introducing same-group attention masking to encourage cross-modal interaction during pretraining. The method uses relative patch position prediction, encouraging spatial reasoning for localisation rather than reconstruction. We evaluate our approach on the Sen1Floods11 flood mapping dataset, where it significantly outperforms existing reconstruction-based self-supervised learning methods for satellite imagery. Our results demonstrate that position prediction tasks, when properly adapted for multimodal satellite imagery, learn representations more effective for satellite image semantic segmentation than reconstruction-based approaches.
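The relative patch position prediction task mentioned above can be sketched as a classification problem over patch-offset classes. The label construction below is an assumption based on the abstract's description (query/reference patches on a ViT-style grid), not the paper's exact code.

```python
def relative_position_label(q_idx, r_idx, grid):
    """Map a (query, reference) patch pair to a relative-offset class.
    Patches index into a grid x grid layout; the row/column offsets
    (dr, dc) each range over [-(grid-1), grid-1], giving a total of
    (2*grid - 1)**2 classes for the position-prediction head to predict.
    Assumed construction, illustrating the idea of supervising
    localisation rather than pixel reconstruction."""
    qr, qc = divmod(q_idx, grid)
    rr, rc = divmod(r_idx, grid)
    dr, dc = qr - rr, qc - rc
    return (dr + grid - 1) * (2 * grid - 1) + (dc + grid - 1)

# 14x14 patch grid (ViT-style); query patch directly below the reference.
label = relative_position_label(14, 0, grid=14)   # dr=1, dc=0
center = relative_position_label(0, 0, grid=14)   # zero offset class
```

Because the target depends only on spatial layout, the pretext task rewards spatial reasoning, which is the property the abstract argues transfers better to segmentation than reconstruction objectives.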


DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction

Pan, Sicong, Jin, Liren, Huang, Xuying, Stachniss, Cyrill, Popović, Marija, Bennewitz, Maren

arXiv.org Artificial Intelligence

Many autonomous robotic applications depend on accurate 3D models of objects to perform downstream tasks. These include object manipulation in household scenarios (Breyer et al. 2022; Dengler et al. 2023; Jauhri et al. 2024), harvesting and prediction of intervention actions in agriculture (Pan et al. 2023; Lenz et al. 2024; Yao et al. 2024), as well as solving jigsaw puzzles of fragmented frescoes in archaeology (Tsesmelis et al. 2024). For these applications, high-fidelity 3D object representations are critical to enable precise action execution and informed decision-making. When deployed in initially unknown environments, robots are often required to autonomously reconstruct 3D models of objects to understand their geometries, textures, positions, and orientations before taking action. Generating these models typically involves capturing data from multiple viewpoints using onboard sensors such as RGB or depth cameras. Data acquisition solely following predefined or randomly chosen sensor viewpoints is inefficient, as these approaches fail to adapt to the geometry and spatial distribution of the object to be reconstructed. This can lead to inferior reconstruction results, especially when objects are complex and contain self-occlusions. To address this, we propose using active reconstruction strategies, where object-specific sensor viewpoints are planned for data acquisition to achieve high-quality 3D object reconstruction. The key aspect of active reconstruction is view planning for generating viewpoints (Zeng et al. 2020a) that enables the robot to acquire the most informative sensor measurements.
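The core of view planning, choosing viewpoints that yield the most informative measurements, can be illustrated with a greedy coverage baseline. This is a classical next-best-view heuristic for contrast, not the paper's diffusion-model-based one-shot planner; the view names and surface ids are invented for the example.

```python
def best_next_view(candidates, seen):
    """Greedy next-best-view selection: pick the candidate viewpoint that
    reveals the most not-yet-observed surface elements. `candidates` maps
    a view id to the set of surface ids visible from it; `seen` is the
    set already covered by previous measurements."""
    gain = {view: len(visible - seen) for view, visible in candidates.items()}
    return max(gain, key=gain.get)

# Hypothetical visibility sets for three candidate camera poses.
views = {"front": {1, 2, 3}, "side": {3, 4, 5, 6}, "top": {1, 6}}
seen = {1, 2, 3}                     # surfaces covered so far
nxt = best_next_view(views, seen)    # "side" adds {4, 5, 6}
```

Iterative schemes like this replan after every measurement; the paper's one-shot formulation instead predicts all object-specific views up front from a diffusion-model estimate of the object's geometry.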


Foundation Feature-Driven Online End-Effector Pose Estimation: A Marker-Free and Learning-Free Approach

Wu, Tianshu, Zhang, Jiyao, Liang, Shiqian, Han, Zhengxiao, Dong, Hao

arXiv.org Artificial Intelligence

Accurate transformation estimation between camera space and robot space is essential. Traditional methods using markers for hand-eye calibration require offline image collection, limiting their suitability for online self-calibration. Recent learning-based robot pose estimation methods, while advancing online calibration, struggle with cross-robot generalization and require the robot to be fully visible. This work proposes a Foundation feature-driven online End-Effector Pose Estimation (FEEPE) algorithm, characterized by its training-free and cross end-effector generalization capabilities. Inspired by the zero-shot generalization capabilities of foundation models, FEEPE leverages pre-trained visual features to estimate 2D-3D correspondences derived from the CAD model and target image, enabling 6D pose estimation via the PnP algorithm. To resolve ambiguities from partial observations and symmetry, a multi-historical key frame enhanced pose optimization algorithm is introduced, utilizing temporal information for improved accuracy. Compared to traditional hand-eye calibration, FEEPE enables marker-free online calibration. Unlike robot pose estimation, it generalizes across robots and end-effectors in a training-free manner. Extensive experiments demonstrate its superior flexibility, generalization, and performance.
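The 2D-3D correspondence pipeline described above feeds a PnP solver, which seeks the pose minimizing reprojection error. A minimal sketch of that objective, with rotation omitted for brevity and illustrative intrinsics, is shown below; a real implementation would use a full 6D pose and a library solver.

```python
import math

def project(point3d, pose_t, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a CAD-model point translated by pose_t into
    the camera frame (rotation omitted for brevity; intrinsics are
    illustrative). PnP recovers the pose that best aligns such
    projections with the matched 2D image points."""
    X, Y, Z = (p + t for p, t in zip(point3d, pose_t))
    return (fx * X / Z + cx, fy * Y / Z + cy)

def reprojection_error(matches, pose_t):
    """Mean pixel distance between observed 2D points and 3D model points
    projected under the candidate translation pose_t."""
    errs = [math.dist(uv, project(xyz, pose_t)) for xyz, uv in matches]
    return sum(errs) / len(errs)

# Synthetic correspondences generated from a known true translation.
pts = [(0.1, 0.0, 0.0), (-0.1, 0.1, 0.0)]
true_t = (0.0, 0.0, 2.0)
matches = [(p, project(p, true_t)) for p in pts]
err_true = reprojection_error(matches, true_t)          # zero at the truth
err_wrong = reprojection_error(matches, (0.0, 0.0, 2.5))  # nonzero elsewhere
```

FEEPE's contribution lies upstream of this step: pre-trained foundation features supply the 2D-3D matches without markers or robot-specific training, and multi-frame optimization disambiguates symmetric or partially visible end-effectors.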


Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training

Xia, Jiatong, Liu, Lingqiao

arXiv.org Artificial Intelligence

3D Gaussian Splatting (3DGS) has demonstrated impressive performance in synthesizing novel views after training on a given set of viewpoints. However, its rendering quality deteriorates when the synthesized view deviates significantly from the training views. This decline occurs due to (1) the model's difficulty in generalizing to out-of-distribution scenarios and (2) challenges in interpolating fine details caused by substantial resolution changes and occlusions. A notable case of this limitation is close-up view generation: producing views that are significantly closer to the object than those in the training set. To tackle this issue, we propose a novel approach for close-up view generation based on progressively training the 3DGS model with self-generated data. Our solution is based on three key ideas. First, we leverage the See3D model, a recently introduced 3D-aware generative model, to enhance the details of rendered views. Second, we propose a strategy to progressively expand the "trust regions" of the 3DGS model and update a set of reference views for See3D. Finally, we introduce a fine-tuning strategy to carefully update the 3DGS model with training data generated from the above schemes. We further define metrics for close-up view evaluation to facilitate better research on this problem. By conducting evaluations on specifically selected scenarios for close-up views, our proposed approach demonstrates a clear advantage over competitive solutions.
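The progressive "trust region" expansion can be pictured as a schedule of intermediate camera distances, each stage moving only slightly beyond views the current model renders well. Geometric interpolation is an illustrative choice here, not the paper's exact schedule.

```python
def camera_distance_schedule(train_dist, target_dist, stages):
    """Shrink the camera-to-object distance from the nearest training view
    toward the desired close-up over several stages, so each stage's
    self-generated training views stay near the model's current trust
    region. Uses geometric interpolation (an assumed, illustrative
    schedule): each stage scales distance by a constant ratio."""
    ratio = (target_dist / train_dist) ** (1.0 / stages)
    return [train_dist * ratio ** k for k in range(1, stages + 1)]

# Training views sit ~4 units from the object; the target close-up is 0.5.
dists = camera_distance_schedule(4.0, 0.5, stages=3)  # halves each stage
```

At each stage, views rendered at the new distance would be refined (e.g., by a generative model such as See3D) and fed back as training data before the next, closer stage.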


Novel Object 6D Pose Estimation with a Single Reference View

Liu, Jian, Sun, Wei, Zeng, Kai, Zheng, Jin, Yang, Hui, Wang, Lin, Rahmani, Hossein, Mian, Ajmal

arXiv.org Artificial Intelligence

Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method. Our key idea is to iteratively establish point-wise alignment in the camera coordinate system based on state space models (SSMs). Specifically, iterative camera-space point-wise alignment can effectively handle large pose discrepancies, while our proposed RGB and Points SSMs can capture long-range dependencies and spatial information from a single view, offering linear complexity and superior spatial modeling capability. Once pre-trained on synthetic data, SinRef-6D can estimate the 6D pose of a novel object using only a single reference view, without requiring retraining or a CAD model. Extensive experiments on six popular datasets and real-world robotic scenes demonstrate that we achieve on-par performance with CAD-based and dense reference view-based methods, despite operating in the more challenging single reference setting. Code will be released at https://github.com/CNJianLiu/SinRef-6D.
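The idea of iteratively refining a point-wise alignment in the camera coordinate system can be illustrated with a toy translation-only analogue. The paper's method predicts updates with learned state space models over RGB and point features; the closed-form mean-residual step below is only a schematic stand-in.

```python
def align_translation(src, dst, iters=5):
    """Iteratively align source points to target points by repeatedly
    applying the mean residual as a translation update. A toy analogue of
    iterative camera-space point-wise alignment; the full problem also
    estimates rotation, and SinRef-6D predicts updates with learned SSMs
    rather than this closed-form step."""
    tx = ty = tz = 0.0
    for _ in range(iters):
        res = [(dx - (sx + tx), dy - (sy + ty), dz - (sz + tz))
               for (sx, sy, sz), (dx, dy, dz) in zip(src, dst)]
        n = len(res)
        tx += sum(r[0] for r in res) / n
        ty += sum(r[1] for r in res) / n
        tz += sum(r[2] for r in res) / n
    return (tx, ty, tz)

# Target points are the source shifted by a known offset (0.3, -0.1, 0.5).
src = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.2, 1.1)]
dst = [(x + 0.3, y - 0.1, z + 0.5) for x, y, z in src]
t = align_translation(src, dst)
```

For a pure translation the update converges immediately; the iterative structure matters when, as in the paper's setting, each step must also absorb large rotational discrepancies between the single reference view and the target view.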